DNA methylation can control some CpG-poor genes, but unbiased studies have not found a consistent genome-wide
association with gene activity outside of CpG islands or shores, possibly due to the use of cell lines or limited bioinformatics
analyses. We performed reduced representation bisulfite sequencing (RRBS) of rat dorsal root ganglia encompassing
postmitotic primary sensory neurons (n = 5, r > 0.99; orthogonal validation p < 10⁻¹⁹). The rat genome suggested a
dichotomy of genes previously reported in other mammals: low CpG content (<3.2%) promoter (LCP) genes and high
CpG content (>3.2%) promoter (HCP) genes. A genome-wide integrated methylome-transcriptome analysis showed that
LCP genes were markedly hypermethylated when repressed and hypomethylated when active, with a 40% difference
in a broad region at and 5' of the transcription start site (TSS) (p < 10⁻⁸⁷ for -6,000 bp to -2,000 bp, p < 10⁻⁷³ for -2,000 bp
to +2,000 bp, no difference in gene body, p = 0.42). HCP genes had minimal TSS-associated methylation regardless of
transcription status, but gene body methylation appeared to be lost in repressed HCP genes. Therefore, diametrically
opposite methylome-transcriptome associations characterize LCP and HCP genes in postmitotic neural tissue in vivo.



Introduction 

DNA methylation is recognized as a molecular mechanism for
retaining cellular identity (tissue-specific methylation signatures)
and as a co-determinant of gene activity. In adult mammalian
tissues, the cytosine residue preceding a guanine, the CpG
dinucleotide motif, is the primary target of DNA methylation. In
well-characterized examples, CpG methylation is inversely correlated
with gene expression, such as in repression of transposable
elements 1 and tumor suppressor genes. 2 Previous studies indicate
that mammalian promoters can be divided into two distinct 
classes determined by CpG content: high CpG promoters (HCP) 
and low CpG promoters (LCP). 3-6 Elango et al. suggested that
this distinction developed during early vertebrate evolution and
is a characteristic of mammalian genomes. 3 Gene ontology
analyses further suggested a functional bimodality whereby LCPs are
strongly associated with tissue-specific genes and HCPs primarily
regulate housekeeping genes. 6,7 Others have used a three-tiered
system of dividing promoters by CpG content, namely low,
intermediate and high CpG promoters. 8-10 The classic model holds
that DNA methylation suppresses transcription by targeting
CpG-rich regions termed CpG islands (CGI). Evidence for this
focus on CpG-rich regions can be found in many older studies
such as Stein et al., 11 Busslinger et al. 12 or Futscher et al. 13 Several
recent reports on methylation in promoter regions concluded
similarly that the inverse correlation between methylation and
transcription is found only in promoters containing CGI, i.e., HCPs,
while no correlation could be found at LCPs. 9,10,14 Differing
models have been presented on how methylation and transcription
are related in CpG-poor regions. In a recent review, Pelizzola and 
Ecker concluded that DNA methylation at CpG poor promoters 
cannot predict expression of the downstream gene. 15 Hughes et 
al. found that the CpG density in repressive DNA methylation is 
significantly lower compared with DNA methylation that is not 
repressive. 16 Studies comparing methylation patterns in different 
organs found tissue differentially methylated regions (tDMR) 
most often outside of CGI, i.e., in CpG-poor regions. 7,17-21 In
cases of single genes or selected gene subgroups, an inverse
correlation of LCP methylation and gene activity was shown. 17,19
Rakyan et al. 7 provided a tDMR study employing methylated
DNA immunoprecipitation (MeDIP) that included a
transcriptome comparison and concluded that methylation controlled the
activity of LCP genes, contradicting the studies discussed above
such as Weber et al. 10 Considering such opposing conclusions in
even the most recent literature, the question remains unsettled
whether and how LCP genes are controlled by DNA methylation.

Studies mapping DNA methylation in the CNS found that 
the neural methylome is unique. 22,23 Ghosh et al., for example,
showed tissue-specific CpG island methylation distinguishing 
neural from non-neural tissue. 22 Performing restriction land- 
mark genomic scanning (RLGS) they identified 34 differen- 
tially methylated CpG islands, which revealed neural specific 
CpG island hypermethylation compared with mesoderm- or 
endoderm-derived tissues. DNA methylation plays a critical role
in neurological disorders such as Rett syndrome 24 or early life
stress 25 and is an important regulator in the CNS, for example in
memory formation. 26 Most studies on the role of DNA methylation in
the nervous system have analyzed the brain. No comprehensive 
analysis of DNA methylation is available in the peripheral ner- 
vous system (PNS). 

The rat ranks among the most important laboratory animals 
but few methylome studies have been performed in this species. 
None has interrogated the rat genome for different gene classes 
that may be relevant for understanding methylome-transcriptome 
associations. We analyzed the distribution of the CpG motif in 
rat promoters and found evidence that the model of dichotomiz- 
ing genes into two classes, LCP and HCP genes, should apply 
to the rat. Nucleotide-resolution methylation data at genome- 
scale is thus far not available for the rat or for the peripheral ner- 
vous system (PNS) of any other species. We performed reduced 
representation bisulfite sequencing (RRBS), 5 a technology for 
interrogating hundreds of thousands of methylation sites with 
digital precision 27,28 on the rat PNS and found characteristic and 
highly significant patterns relating gene methylation and activity 
(p ≈ 1.0 × 10⁻⁷⁴) that differed markedly between LCP and HCP
genes. 

Results 

RRBS of genomic DNA from the L4 DRG provided information
on 16 × 10⁶ cytosines including 2.8 × 10⁶ CpG sites. DNA
methylation was found principally at cytosines in CpG sites, the
known methylation motif (Fig. S1A). Less than 1% of cytosine
methylation occurred at CpH sites, with a preference for the CpA motif
(Fig. S1B). Methylation levels at CpG sites followed a bimodal
distribution similar to previously reported bisulfite conversion-
based studies in other tissues. 29-31 Over half of the CpG sites
were unmethylated, a fifth appeared completely methylated, 
and the remaining sites were methylated at an intermediate level 
(Fig. S2). 

RRBS quantification proved highly reproducible among five 
independent biological replicates. The Pearson correlation coeffi- 
cient was > 0.988 for comparison of individual CpG methylation 
levels among all possible pairings. 
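
The pairwise comparison described above can be illustrated with a minimal sketch. It assumes that each replicate is represented as a list of methylation levels (0 to 1) for the same CpG sites in the same order; the function names and the toy data are hypothetical and are not taken from the original analysis.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(replicates):
    """Return {(name_a, name_b): r} for every pair of replicates.

    `replicates` maps a replicate name to methylation levels listed in
    the same CpG order for every replicate (an assumption of this sketch).
    """
    return {
        (a, b): pearson(replicates[a], replicates[b])
        for a, b in combinations(sorted(replicates), 2)
    }


# Hypothetical toy data: three replicates, five shared CpG sites.
toy = {
    "rep1": [0.00, 0.05, 0.50, 0.95, 1.00],
    "rep2": [0.02, 0.04, 0.55, 0.93, 0.98],
    "rep3": [0.00, 0.08, 0.45, 0.97, 1.00],
}
correlations = pairwise_correlations(toy)
lowest = min(correlations.values())  # the weakest pairing bounds all others
```

With five replicates this yields ten pairings; reporting the lowest coefficient corresponds to a statement of the form "r > 0.988 for all possible pairings".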

An independent technology, the HpaII tiny fragment
Enrichment by Ligation-mediated PCR (HELP) assay, was
performed as orthogonal validation. HELP detects methylation
status using a pair of restriction enzymes, the methylation-sensitive
HpaII and the methylation-insensitive MspI. 32 The protocol
steps of HELP and RRBS are mutually exclusive. Of 7,738 sites
with high and low methylation states that were unambiguously
informative in both assays, 82% were concordant, which was
highly significant with p < 10⁻¹⁹ (Table S1).

CpG dinucleotides are unevenly scattered across the genome. 
We found that CpG density in the rat genome peaked in the pro- 
moter region around the TSS consistent with findings in other 
genomes and with the observation that many promoters over- 
lap with CGI (Fig. 1A). Next, we examined how the feature of 
promoter CpG density varied throughout the genome of the rat. 
Specifically, we wished to determine whether promoter CpG den- 
sity was bimodally distributed in the rat like it is in the human 
genome. 6 Taking the region from -500 bp to +500 bp around the 
TSS as a proxy, a bimodal distribution was noted, indicating two
distinct groups (Fig. 1B). Dichotomizing the genes with a cutoff
at a CpG density of 3.2% led to the classification of low CpG
content (<3.2%) promoter, "LCP," genes and high CpG content
(>3.2%) promoter, "HCP," genes (Fig. 1B and C). Of 17,602
protein-coding genes included in this study, 8,644 were LCP
genes and 8,958 were HCP genes.
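
The classification step can be sketched as follows. The sketch assumes that promoter CpG content is the number of CpG dinucleotides per 100 bp of the 1,000 bp window around the TSS; the exact definition is not spelled out in the text, so this is an assumption, as are the function names. Assigning a promoter that falls exactly on the cutoff to the HCP class is an arbitrary choice made here.

```python
CUTOFF_PERCENT = 3.2  # valley of the bimodal distribution reported in the text


def cpg_percent(promoter_seq):
    """CpG dinucleotides per 100 bp of sequence (assumed definition)."""
    seq = promoter_seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * seq.count("CG") / len(seq)


def classify_promoter(promoter_seq, cutoff=CUTOFF_PERCENT):
    """Return 'LCP' below the cutoff and 'HCP' otherwise.

    A value exactly at the cutoff is called HCP, an arbitrary
    tie-breaking choice for this sketch.
    """
    return "LCP" if cpg_percent(promoter_seq) < cutoff else "HCP"


# Hypothetical 1,000 bp promoter windows (-500 bp to +500 bp).
cpg_poor = "ATTA" * 250   # contains no CpG dinucleotide
cpg_rich = "ACGT" * 250   # contains one CpG per 4 bp
labels = (classify_promoter(cpg_poor), classify_promoter(cpg_rich))
```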

Sub-grouping genes by CpG density into more than two strata 
did not alter subsequent analyses further supporting the dichoto- 
mous classification. Analyzing LCP and HCP genes as two enti- 
ties (rather than all genes together), however, was an important 
distinction. 

Measurements of gene activity were available from our previ- 
ous study, which had applied another massively parallel sequenc- 
ing technique to the DRG, RNA-seq, validating the method 
extensively with qPCR. 33 

In LCP genes, transcriptional activity was linked to the meth- 
ylation of CpG sites located at the TSS and within up to 8,000 
nucleotides 5' of the TSS, a region commonly implicated in gene 
regulation (Fig. 2A). The most highly expressed genes had a mean 
level of CpG methylation of 0% in this upstream region, while 
it was >40% for silenced genes (Fig. 2A). For this comparison, 
the mean % of methylation was "trimmed" by removing the
highest and lowest 1/5th of values before averaging ("20% trimmed
mean"), a common method in descriptive statistics to capture
the middle ("central tendency") of data sets that do not follow a
normal distribution. We then performed a separate comparison of
the top 1/5th of values by determining the 80th percentile rank
of methylation levels. Here the difference in methylation levels 
between highly expressed and silenced LCP genes became even 
more evident: for highly expressed genes, the 80th percentile rank 
of methylation was <10% in a region from -1,000 bp to -3,000 
bp from the TSS, while it was >90% for silenced genes (Fig. 2B). 
These findings for LCP genes are consistent with the classic model 
of increased methylation at promoters associated with decreased 
transcription. However, in the rat DRG this correlation was exclu- 
sively found in LCP genes suggesting that the repressive function 
of DNA methylation was most effective in shutting down genes in 
regions in which its target motif, the CpG dinucleotide, is sparse. 

LCP genes appeared highly methylated downstream of the
TSS, i.e., in the region of the gene body, whether or not they were
expressed. Because a positive correlation of gene body methylation
and expression was reported by others, we addressed this question
further in a complementary analysis binning CpG methylation
level data for exons and introns (first exon excluded from the exon
bin because it overlaps the TSS). This comparison confirmed the
above by showing that in LCP genes methylation of introns and
exons was high regardless of expression levels (Fig. 2C).

Figure 1. Promoter CpG density defining the dichotomy of LCP vs. HCP genes in the rat genome. (A) Unselected rat genes are (in aggregate)
characterized by a high density (frequency) of the CpG motif around the TSS (0 bp). Shown is the CpG density of 17,602 protein-coding genes included in the
main analysis. Bins with a width of 500 nucleotides are shown. CpG density peaked in a narrow region of 1,000 to 2,000 nucleotides around the TSS.
CpG density varied, however, considerably between genes as demonstrated by box plots indicating the 5th, 25th, 50th, 75th and 95th percentiles.
(B) The promoter CpG content of individual genes was bimodally distributed among the total set of genes, indicating two distinct classes of promoters.
The CpG content of the core promoter region was determined by choosing a 1,000 bp interval around the TSS (from -500 bp to +500 bp) as a proxy.
Promoter CpG content varied among individual genes from <0.5% to >10%. Depiction of the promoter CpG content as a histogram demonstrated two
peaks at 1% and 5.5%, suggesting a mixed distribution resulting from two distinct underlying populations. The position of the valley suggested a
cutoff at 3.2% (vertical red line). The resulting dichotomization of genes provided a classification of "LCP" and "HCP" genes resembling that originally
proposed by Saxonov et al. 6 which guided subsequent analyses. (C) CpG density in LCP genes was low not only, as expected, at the TSS but also
throughout the remaining gene regions, suggesting that there were no unrecognized regions of higher CpG density, CGI, farther away from the TSS.
CpG density of HCP genes was high at the TSS, reflecting how they were defined.

Figure 2. Diametric methylome-transcriptome relationships in LCP vs. HCP genes. (A) LCP genes (20% trimmed mean): mean CpG methylation levels
are shown for highly expressed (red) and repressed (green) LCP genes (low CpG content promoter genes containing <3.2% CpG). Shown is the 20%
trimmed mean in a 1,000 bp-wide moving window. CpG sites located at the TSS and within several thousand nucleotides 5' of the TSS differed
markedly between highly expressed and repressed LCP genes. Differences were highly significant in the region -6,000 bp to -2,000 bp with p ≈ 3.7 × 10⁻⁸⁸
and in the region -2,000 bp to +2,000 bp with p ≈ 1.0 × 10⁻⁷⁴. In the region +2,000 bp to +6,000 bp there was no significant difference with p = 0.42.
(B) LCP genes (80th percentile): The 80th percentile rank of CpG methylation levels supported the same observation, demonstrating hypomethylation
of highly active genes and hypermethylation of silenced genes 5' of the TSS. Methylation downstream of the TSS was high in LCP genes regardless
of gene activity. (C) Gene body methylation in LCP genes: CpG methylation was similar in the gene bodies of highly expressed and silent LCP genes.
Shown are boxplots for exons (E) and introns (I) indicating the 10th, 25th, 50th, 75th and 90th percentile rank of methylation levels for each gene
group. (D) HCP genes (20% trimmed mean): Mean CpG methylation levels of highly expressed and repressed HCP genes showed the characteristic
deep valley of hypomethylation around the TSS, which is the region of high CpG motif density defining the HCP gene group. Methylation of silenced
genes appeared to be only minimally higher at the TSS. (E) HCP genes (80th percentile): The 80th percentile rank of CpG methylation levels further
supported the observation that the TSS of HCP genes remained poorly methylated regardless of the level of gene activity. Highly expressed HCP
genes were marked by methylation outside of the TSS while the silenced HCP genes appeared to be relatively hypomethylated throughout the whole
gene. (F) Gene body methylation in HCP genes: CpG methylation differed between the gene bodies of highly expressed and silent HCP genes.
Differences were highly statistically significant with p ≈ 2.6 × 10⁻¹⁵⁶ for exons and p ≈ 1.2 × 10⁻⁷⁵ for introns.

In HCP genes the TSS was only minimally affected by meth- 
ylation, remaining essentially unmethylated regardless of the 
level of gene expression (Fig. 2D and E). Outside of the TSS 
region highly expressed HCP genes were highly methylated and 
repressed HCP genes were hypomethylated. High methylation 
levels extended several thousand nucleotides on both sides of 
the TSS including the gene body. As with the LCP genes above, 
the main findings for HCP genes were apparent by comparing 
the two subgroups of genes (highly expressed and silent) using 
the trimmed mean % of methylation (Fig. 2D). The subgroup 
differences appeared even more clearly when the top 1/5th of
methylation values was compared using the 80th percentile rank 
(Fig. 2E). 

In contrast to LCP genes, differences in gene body methylation
in HCP genes were linked to expression. Introns and exons were poorly
methylated in repressed genes and highly methylated in active
genes (Fig. 2F), a relationship consistent with the
hypermethylation of transcribed sequences reported previously in references
8, 29, 30 and 34.

Discussion 

A sequencing-based integrated, genome-wide methylome- 
transcriptome analysis was performed in the DRG, a post- 
mitotic neural tissue harvested ex vivo without subsequent in 
vitro culture. Tissue was from the rat, an important laboratory 
animal for which a genome-wide DNA methylation map with 
nucleotide resolution was previously lacking. We used RRBS, 
a technique that was validated by several others 35 and may rep- 
resent the state of the art in the DNA methylation field. We 
added our own orthogonal validation with another genome-wide
methylome analysis tool, the HELP assay, showing highly
significant agreement for CpG sites covered in both assays
(p < 10⁻¹⁹). While many other RRBS studies were limited to two
replicates per tissue, we analyzed 5 biological replicates,
demonstrating very high reproducibility. Taken together, the reported
data therefore provide a technically faithful representation of
the methylation levels of over 2.8 × 10⁶ CpG sites, corresponding to
approximately 6% of such sites in the rat genome (CpG-level
data will be available through the online rat genome database
RGD as described in methods).

Examination of the rat genome sequence suggested two pro- 
moter types defined by CpG density: LCP and HCP. The dis- 
tinction between LCP and HCP genes based on the bimodality 
of the underlying distribution of promoter CpG density was 
originally proposed for the human genome by Saxonov et al. 6 and 
has been used recently in the analysis of genome-wide studies of 
human promoter control by histone modification. 36 

LCP and HCP genes are similarly frequent in the mam- 
malian genome. Consequently, mechanisms affecting only one 
group may be diluted by noise from the other or will be can- 
celled out entirely, if all genes are analyzed as a single universe. 
Another incremental improvement to the analysis was the use of 
a trimmed mean and percentile ranks when comparing aggregate 
methylation levels between two groups of genes. In our case, the 
analysis of all genes combined using the untrimmed mean
(average) showed methylation effects at the TSS (Fig. S3) resembling
the depictions in other recent reports, 8,29,30,37 while obscuring
important insights. However, analyzing LCP and HCP genes
as distinct entities demonstrated different associations between
gene methylation and activity in each group (Fig. 2). The main
findings of our study therefore include not only biological obser- 
vations but also suggestions toward refining standard analyses 
of genome-wide methylome-transcriptome data sets. For LCP 
genes our data clearly showed that there was an inverse
correlation between promoter methylation and gene activity. Our
findings contribute to a body of literature that has not reached
agreement to date on whether or not DNA methylation
acts on CpG-poor promoters. Our findings agree with studies
by Eckhardt et al., 17 Han et al. 19 and Hughes et al. 16 Eckhardt
et al. 17 showed an inverse correlation between methylation
and transcription for a single low CpG promoter gene, oncostatin.
Han et al. showed this relationship for the two tissue-specific
CpG-poor promoters, LAMB3 and RUNX3. 19 Our findings for
LCPs are also in alignment with studies focusing on t-DMRs, 
which found that hypermethylated t-DMRs associated with 
CpG poor regions (many but not all at promoters) are linked to 
gene repression. A comprehensive RRBS study by Meissner et 
al. included data on LCP promoters showing that in some cases 
methylation changed during tissue differentiation correlating 
with activating or repressing histone marks. 36 On the other hand, 
many reports found that LCP promoters are not targeted or not 
affected by DNA methylation, e.g., Weber et al. 10 and Koga et 
al. 16 reported that LCP genes were expressed regardless of pro- 
moter methylation and a recent review article reached a similar 
conclusion. 15 Some of the discrepancies in the literature may be 
related to experimental design, e.g., both of the above studies 
relied on a lower-resolution assay, MeDIP, which performs less 
sensitively in CpG-poor regions. Weber et al. 10 used RNAP II 
occupancy as a proxy for expression, which may be problematic
because RNAP II can also bind to inactive genes. Our study is
not affected by either of these potential pitfalls. Our experiments 
provide nucleotide resolution data on more CpG sites in LCP 
genes than any of the above-cited reports. By design our study 
was not limited to selected LCP genes but executed an unbi- 
ased and genome-wide analysis strengthening the case that gene 
methylation and activity are linked at LCP promoters. 

Our findings for HCPs are consistent with the observation in 
many prior studies that the TSS (of HCPs) is typically unmeth- 
ylated even in inactive genes. We studied a healthy, postmitotic 
tissue. Therefore, our findings do not disagree with the many 
well-established cases of HCP gene shutdown by promoter hyper- 
methylation occurring in cancer or other diseases (as discussed 
in the introduction). We found that there was minimal if any 
difference in methylation in HCPs in the TSS region in the rat 
DRG. At the same time, our data suggest a positive correlation
between gene activity and the methylation of the body region
of HCP genes, i.e., exons (p ≈ 2.6 × 10⁻¹⁵⁶) and introns (p ≈ 1.2 ×
10⁻⁷⁵). This is consistent with the original report of this
observation by Zilberman et al. 34 in A. thaliana and subsequent studies
in a variety of human embryonic and adult cell types. 8,29,30 In our
study, the observation did not extend to LCP genes. This
dissimilarity between LCP and HCP genes does not appear to be related to CpG
density in the gene body, which is similar in both gene groups as
shown in Figure 1C.

Furthermore, close examination of our data on HCP genes 
suggested the possibility that there might also be a positive cor- 
relation between gene activity and DNA methylation of a several- 
thousand nucleotide wide region upstream of the TSS, a topic 
that may warrant further exploration. 

The main finding of our study was that the LCP vs. HCP
distinction is strongly predictive of different
methylome-transcriptome relationships. It is important to note that, similar to
other studies emphasizing a genome-wide, agnostic approach, our
data demonstrate correlations between gene methylation and
activity but do not establish causal relationships. At the same
time, other genome biology studies suggest that the methylome 
differences observed here between HCP and LCP may indeed 
be mechanistically linked to transcription through control of 
chromatin. A recent report on another epigenetic mechanism, 
histone post-translational modification, found that different his- 
tone modifications predicted the activity of LCP and HCP genes 
(H3K4me3 and H3K79me1 vs. H3K27ac and H4K20me1). 36
Taken together, these and our results support the model that 
alternate regulatory mechanisms are active in LCP and HCP 
genes. 

How DNA methylation (co-) determines (or is associated 
with) gene activity may also depend on the tissue or experimental 
system under study. For instance shutdown of HCP promoters by 
DNA methylation is common in tumors but (as discussed above) 
may not be operative in normal tissues such as the PNS investi- 
gated here. Furthermore, it has been shown that in vitro culture 
of cells for as few as nine passages can induce new, non-random 
patterns of DNA methylation, 5 which may be critical for some 
other studies. DRG neurons have lost the ability to divide. The 
present study may therefore be most representative of postmitotic
tissues or may highlight features characteristic of the nervous
system.

Methods 

Tissue samples. Dorsal root ganglia (DRG) from the L4 level 
were harvested from adult male Sprague Dawley rats, flash- 
frozen and immediately stored at -80°C. DRG used for HELP 
assay analysis (n = 5) and for RRBS studies (n = 5) were har- 
vested from two groups of neurologically intact rats and com- 
pared with the control group of a previous study, 33 which was 
also neurologically intact (i.e., the group without a nerve liga- 
tion), in order to match the condition under which the existing 
transcriptome data set was obtained. All animals were obtained 
from Harlan Laboratories, Inc. All procedures were approved by 
the Institutional Animal Care and Use Committee. 

Reduced representation bisulfite sequencing (RRBS). DNA
extraction was performed using spin columns (Qiagen)
according to the manufacturer's instructions. One µg of genomic
DNA from the L4 DRG of each of 5 adult male Sprague Dawley
rats was digested with MspI. Restriction ends were blunted, 3'
adenylated and ligated to pre-annealed forked Illumina adaptors
containing 5-methylcytosines. For each of the 5 samples, two
library size ranges, 150-175 bp and 175-225 bp including
adaptor length, were isolated from a 2% agarose gel. Subsequently,
samples were treated with the EpiTect Bisulfite kit (Qiagen),
extending the bisulfite conversion time beyond the
manufacturer's protocol to 14 h by adding 3 cycles of 95°C × 5 min, 60°C
× 180 min. Purified samples were subjected to 18 cycles of PCR
and gel-purified. Sequences of 50 bp were obtained on an Illumina
GA IIx genome analyzer.
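
The MspI digestion and size selection can be mimicked in silico, which is also how the set of fragments expected in an RRBS library is usually predicted. The sketch below is illustrative only: it relies on the known MspI cleavage position (C^CGG), ignores end repair, and treats the combined adaptor length as a parameter the caller must supply, because that length is not stated in the text. The function names and the toy sequence are hypothetical.

```python
def mspi_fragments(seq):
    """Cut `seq` at every MspI site (C^CGG) and return the pieces.

    The first and last pieces are bounded by a sequence end rather than
    by two MspI sites; a real prediction would treat them separately.
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find("CCGG")
    while pos != -1:
        cuts.append(pos + 1)              # cleavage after the first C
        pos = seq.find("CCGG", pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def select_by_size(fragments, min_len, max_len, adaptor_total):
    """Keep fragments whose length plus adaptors lies in [min_len, max_len].

    `adaptor_total` is the combined length of both ligated adaptors; it is
    an input of this sketch because the source does not state it.
    """
    return [f for f in fragments
            if min_len <= len(f) + adaptor_total <= max_len]


# Hypothetical toy sequence with two MspI sites.
toy_fragments = mspi_fragments("AAACCGGTTTTCCGGAA")
toy_selected = select_by_size(toy_fragments, 8, 10, adaptor_total=0)
```

For the library described above, the gel ranges of 150-175 bp and 175-225 bp would be passed as `min_len` and `max_len`, together with the actual adaptor length.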

Sequence read alignment and calculation of cytosine
methylation levels. Sequence reads were mapped in the
three-nucleotide space (A, G, T) to MspI fragments predicted from
the forward and reverse strand of the rat reference genome (rn4)
using the stand_alone_Eland_extended module. Sequence read
alignment performed with an alternative aligner, Bowtie, 38
yielded identical alignments for the large majority of reads and
indistinguishable results in all downstream analyses. Cytosines
were then scored in each read in a binary fashion as
methylated or unmethylated according to the sequencing result in the
respective position: C indicating methylation (base protected
from bisulfite conversion) and T indicating lack of methylation
(cytosine base converted to uracil by bisulfite reaction and
subsequently amplified as thymidine by PCR). Read-level data were
then aggregated in a MySQL database. A minimum of ten reads
was required for inclusion of a cytosine in subsequent high-level
analyses, which were performed using the statistical program
package R. The average read depth for the CpG sites included in
the analysis was 503-fold.
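
The binary scoring and the coverage filter can be sketched as follows. The original aggregation was done in a MySQL database and in R; the Python below is only an illustration of the same logic. The input layout (one `(position, base)` pair per read and reference cytosine) and the function name are assumptions of this sketch.

```python
from collections import defaultdict

MIN_READS = 10  # coverage threshold stated in the text


def methylation_levels(calls, min_reads=MIN_READS):
    """Aggregate per-read base calls into per-cytosine methylation levels.

    `calls` is an iterable of (position, base) pairs, one per read that
    covers a reference cytosine: 'C' means protected from conversion
    (methylated), 'T' means converted (unmethylated). Any other base is
    skipped as a sequencing error. Returns {position: fraction of reads
    scored as methylated} for positions with at least `min_reads` calls.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for position, base in calls:
        if base not in ("C", "T"):
            continue
        total[position] += 1
        if base == "C":
            methylated[position] += 1
    return {p: methylated[p] / n for p, n in total.items() if n >= min_reads}


# Hypothetical toy input: three cytosines with 10, 9 and 12 usable reads.
toy_calls = (
    [(100, "C")] * 8 + [(100, "T")] * 2   # 80% methylated
    + [(200, "T")] * 9                    # below the coverage threshold
    + [(300, "T")] * 12 + [(300, "N")]    # unmethylated, one error ignored
)
toy_levels = methylation_levels(toy_calls)
```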

High-level methylation analysis of gene regions. Genes were 
grouped according to the criteria detailed in the figure legends 
and aligned according to their annotated TSS. A rectangular 
sliding window of 1,000 bp width without further smoothing 
was moved across a region from -8,000 to +8,000 bp relative to 
the TSS. Methylation levels of all CpG sites (covered by the RRBS
experiment with at least ten reads) were grouped together to
calculate the mean methylation level and percentile ranks of
methylation levels at each nucleotide position. Statistical
measures commonly used to describe differences between normally
(Gaussian) distributed data such as the average are ineffective
in capturing important differences in data sets such as of CpG
methylation, where relatively few data points are near the average
and most data points cluster at the extremes. (The distribution of
methylation levels resembles a β-binomial distribution.) In such
cases it is important to understand which portion of the data is
affected when a difference is observed. To clarify why the aver- 
age (mean) may not be a good measure, we could picture a group 
of CpG methylation levels as people with different incomes in a 
country with great social disparity: Many are living in extreme 
poverty earning almost nothing (methylation = 0), some are very 
rich (methylation = 1.0) and the middle class is underrepresented 
(few methylation levels from 0.2 to 0.8). The "average" (income 
or methylation level) represents the ratio of the data points at the 
extremes and falls into a range, where very few individual data 
points lie. To capture the data in a more meaningful way we used 
percentile ranks. All genomic areas analyzed encompassed gener- 
ally at least 10% of unmethylated CpGs, i.e., the 10th percentile 
rank is a flat line across all regions of most genes regardless of 
expression level or promoter CpG content and is therefore unin- 
formative. Accordingly, the 20% trimmed mean was depicted in 
the figures to capture the middle section of the data set. The high 
percentile ranks were highly informative. Therefore, the 80th 
percentile is shown in the figures. 
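
The moving-window summary can be sketched in a few lines. The original analysis was performed in R; the Python below is an illustration under stated assumptions: CpG sites of all genes in a group are pooled as `(position relative to TSS, methylation level)` pairs, the window is half-open, and the percentile uses the nearest-rank convention, which is one of several common definitions. All function names are hypothetical.

```python
from bisect import bisect_left


def trimmed_mean(values, trim=0.2):
    """Mean after discarding the lowest and highest `trim` fraction."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    core = ordered[k:len(ordered) - k]
    return sum(core) / len(core)


def percentile_rank(values, pct=80):
    """Nearest-rank percentile for an integer `pct` (assumed convention)."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def window_profile(sites, stat, start=-8000, stop=8000, width=1000, step=1):
    """Apply `stat` to pooled CpG levels in a rectangular moving window.

    `sites` holds (position relative to TSS, methylation level) pairs
    pooled over all genes of one group. Each window covers
    [centre - width/2, centre + width/2). Windows without data are
    omitted. Returns {window centre: statistic}.
    """
    ordered = sorted(sites)
    positions = [p for p, _ in ordered]
    levels = [m for _, m in ordered]
    half = width // 2
    profile = {}
    for centre in range(start, stop + 1, step):
        lo = bisect_left(positions, centre - half)
        hi = bisect_left(positions, centre + half)
        if hi > lo:
            profile[centre] = stat(levels[lo:hi])
    return profile


# Hypothetical toy group: methylated upstream, unmethylated near the TSS.
toy_sites = [(-1200, 1.0), (-1000, 0.9), (-900, 1.0), (100, 0.0), (200, 0.1)]
toy_profile = window_profile(toy_sites, trimmed_mean,
                             start=-1000, stop=0, step=1000)
```

For a strongly bimodal set such as six unmethylated and four fully methylated sites, the plain mean is 0.4, the 20% trimmed mean is 1/3 and the 80th percentile is 1.0, which illustrates why the three summaries can tell different stories.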

Significance testing. Differences in the methylation levels 
were assessed for gene regions. At each nucleotide position within 
the region the 20% trimmed mean was determined for each gene 
group resulting in a value pair (one value each for the repressed 
and highly active gene group). For instance in a 2,000 nucleotide 
wide region up to 2,000 such value pairs could be formed, while 
only those nucleotide positions were included for which data was 
available in both gene groups. A paired, two-sided t-test was then 
performed. 
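
The pairing logic can be made concrete with a short sketch. It assumes that each gene group is summarized as a mapping from nucleotide position to the 20% trimmed mean at that position; only positions present in both mappings form value pairs. The sketch computes the paired t statistic and its degrees of freedom with the standard library only; the two-sided p value would then be taken from a t distribution with n - 1 degrees of freedom (for example with `scipy.stats.ttest_rel`, which performs the same paired test). The function name and toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(profile_a, profile_b):
    """Paired t statistic over positions present in both profiles.

    Each profile maps a nucleotide position to the 20% trimmed mean
    methylation of one gene group. Returns (t, degrees of freedom,
    number of value pairs).
    """
    shared = sorted(set(profile_a) & set(profile_b))
    diffs = [profile_a[p] - profile_b[p] for p in shared]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two shared positions")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("constant differences; t is undefined")
    t = mean(diffs) / (sd / sqrt(n))
    return t, n - 1, n


# Hypothetical toy profiles; only positions -2, -1 and 0 are shared.
toy_repressed = {-3: 0.5, -2: 0.6, -1: 0.4, 0: 0.5}
toy_active = {-2: 0.1, -1: 0.0, 0: 0.05, 1: 0.2}
toy_t, toy_df, toy_n = paired_t(toy_repressed, toy_active)
```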

Gene expression levels. Gene activity was determined from 
publicly available raw read data of our recent DRG transcriptome 
study, 33 which had used RNA-seq on poly-A purified RNA from 
the rat DRG. Gene expression levels were quantified as reads 
aligned to specific genes normalized for the total number of reads 
aligned in each sample. Repressed genes were defined as those 
lacking reads. Active genes were divided into five groups of equal 
size according to expression levels, whereby the top group was 
referred to as highly active genes. 
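
The grouping rule can be sketched as follows. The input layout (a mapping from gene to the number of aligned reads in one sample) and the function name are assumptions of this sketch; ties in expression are broken by gene name, an arbitrary choice. As in the text, counts are normalized only by the total number of aligned reads.

```python
def expression_groups(read_counts, n_groups=5):
    """Split genes into repressed genes and equally sized active groups.

    `read_counts` maps gene -> reads aligned to that gene in one sample.
    Returns (repressed genes, list of active groups ordered from the
    lowest to the highest expression). Genes without reads are repressed.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no aligned reads")
    normalized = {g: c / total for g, c in read_counts.items()}
    repressed = sorted(g for g, c in read_counts.items() if c == 0)
    active = sorted((g for g, c in read_counts.items() if c > 0),
                    key=lambda g: (normalized[g], g))
    size = len(active)
    groups = [active[i * size // n_groups:(i + 1) * size // n_groups]
              for i in range(n_groups)]
    return repressed, groups


# Hypothetical toy counts: gene "g0" has no reads, "g10" has the most.
toy_counts = {"g%d" % i: i for i in range(11)}
toy_repressed_genes, toy_groups = expression_groups(toy_counts)
toy_highly_active = toy_groups[-1]  # top fifth of the active genes
```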

HELP assay. The HpaII tiny fragment Enrichment by Ligation-
mediated PCR (HELP) assay was performed as described in the
original report, 32 employing a published rat-specific Nimblegen
oligonucleotide array design. 39 In brief, genomic DNA from each
DRG (n = 5) was divided into two aliquots, which were subjected
to restriction digestion with MspI or HpaII. These enzymes are
isoschizomers recognizing the same sequence motif, 5'-CCGG-3',
while only HpaII is inhibited by methylation of the CpG in
the center of the motif and MspI is always active independently
of the cytosine methylation state. The pattern of restriction
fragments generated with MspI and HpaII will therefore be
different, reflecting the methylation state of 5'-CCGG-3' sites
throughout the genome. To detect the differences, restriction
fragments were amplified by ligation-mediated PCR following
published protocols 32 and subjected to competitive hybridization
on a custom rat oligonucleotide array. 39 Hybridization was
performed at the Nimblegen contract research laboratory, which was
accessed through the genomics core facility contract services at
Albert Einstein College of Medicine, where the HELP assay was
originally developed. 39 Program scripts from a published HELP
assay bioinformatics pipeline available through Bioconductor 40
were utilized for processing HELP data. The threshold for
fragment detection was set at 2.5x background fluorescence for HpaII
and MspI hybridizations. Fragments with lower fluorescence in
the MspI samples were non-informative. Informative fragments
were included in the validation analysis for the above RRBS assay
if the CpG sites flanking the HELP assay fragment were also
assayed by RRBS. CpG sites were scored as methylated in the
HELP assay if the respective fragment was detected only in the
MspI sample and as unmethylated if it was detected in both the
MspI and the HpaII sample.
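
The scoring rule can be written out as a small decision function. This is a sketch, not the published HELP pipeline: it assumes a single background value shared by both hybridizations, and the function names are hypothetical. The concordance helper simply compares categorical calls from two assays; how RRBS levels were binned into high and low states is not specified here and is left to the caller.

```python
THRESHOLD_FACTOR = 2.5  # detection threshold relative to background (from the text)


def score_help_fragment(msp_signal, hpa_signal, background,
                        factor=THRESHOLD_FACTOR):
    """Classify one HELP fragment from its two hybridization signals.

    Returns 'methylated' if the fragment is detected only in the MspI
    sample, 'unmethylated' if it is detected in both samples, and None
    if it is not detected in the MspI sample (non-informative).
    """
    cutoff = factor * background
    if msp_signal < cutoff:
        return None
    return "unmethylated" if hpa_signal >= cutoff else "methylated"


def concordance(calls_a, calls_b):
    """Fraction of sites with identical calls in two assays."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty call lists of equal length")
    matches = sum(a == b for a, b in zip(calls_a, calls_b))
    return matches / len(calls_a)


# Hypothetical fluorescence values with a background of 100 units.
toy_scores = [
    score_help_fragment(1000, 900, 100),  # detected in both samples
    score_help_fragment(1000, 100, 100),  # detected only with MspI
    score_help_fragment(200, 900, 100),   # MspI below threshold
]
```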